The goal of this analysis is to build a model that predicts which students are consuming dangerous amounts of alcohol. We will first analyse all of the features, then use what we learn to build the model. Hopefully we will be able to engineer some useful features and find some strong relationships along the way.

## 'data.frame':    649 obs. of  33 variables:
##  $ school    : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sex       : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
##  $ age       : int  18 17 15 15 16 16 16 17 15 15 ...
##  $ address   : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
##  $ famsize   : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
##  $ Pstatus   : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
##  $ Medu      : int  4 1 1 4 3 4 2 4 3 3 ...
##  $ Fedu      : int  4 1 1 2 3 3 2 4 2 4 ...
##  $ Mjob      : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
##  $ Fjob      : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
##  $ reason    : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
##  $ guardian  : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
##  $ traveltime: int  2 1 1 1 1 1 1 2 1 1 ...
##  $ studytime : int  2 2 2 3 2 2 2 2 2 2 ...
##  $ failures  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
##  $ famsup    : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
##  $ paid      : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
##  $ nursery   : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
##  $ higher    : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ internet  : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
##  $ romantic  : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
##  $ famrel    : int  4 5 4 3 4 5 4 4 4 5 ...
##  $ freetime  : int  3 3 3 2 3 4 4 1 2 5 ...
##  $ goout     : int  4 3 2 2 2 2 4 4 2 1 ...
##  $ Dalc      : int  1 1 2 1 1 1 1 1 1 1 ...
##  $ Walc      : int  1 1 3 1 2 2 1 1 1 1 ...
##  $ health    : int  3 3 3 5 5 5 3 1 1 5 ...
##  $ absences  : int  4 2 6 0 0 6 0 2 0 0 ...
##  $ G1        : int  0 9 12 14 11 12 13 10 15 12 ...
##  $ G2        : int  11 11 13 14 13 12 12 13 16 12 ...
##  $ G3        : int  11 11 12 14 13 13 13 13 17 13 ...

It’s a rather small dataset, and I’m worried that it does not reflect the average student well. We are only looking at students who have taken a Portuguese class, an elective course, so there might be a particular ‘type’ of student who takes it. I expect we would get a more accurate picture of the average student from a mandatory class, such as math or English. Nonetheless, I hope we will have some interesting and useful findings.

Univariate Analysis

## [1] "Percent of students in each group:"
## 
##          1          2          3          4          5 
## 0.69491525 0.18644068 0.06625578 0.02619414 0.02619414

The majority of the students consume very little, if any, alcohol during the week, but about 5% of students drink significant amounts (values >= 4).

## [1] "Percent of students in each group:"
## 
##          1          2          3          4          5 
## 0.38058552 0.23112481 0.18489985 0.13405239 0.06933744

Clearly, students drink much more alcohol on weekends than on weekdays. The share of significant drinkers jumps from about 5% to just over 20%, and the share drinking little to no alcohol (value = 1) falls by nearly half.

## [1] "Correlation between weekday and weekend:"
## [1] 0.6165614

Of the 649 students, 241 (37.1%) drink little to no alcohol on both weekdays and weekends. Of the 451 students who drink little to nothing during the week, 210 (46.6%) consume some alcohol on the weekends. Students very rarely drink more during the week than on weekends.
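The counts in the paragraph above come from comparing the weekday and weekend scores student by student. The following is a minimal sketch of that comparison, assuming a data frame named `df` with the `Dalc` and `Walc` columns shown in the structure listing; the eight rows of toy data are invented for illustration and are not taken from the survey.

```r
# Toy data standing in for the real survey (values are illustrative only).
df <- data.frame(
  Dalc = c(1, 1, 1, 1, 2, 3, 1, 5),
  Walc = c(1, 1, 2, 3, 2, 4, 1, 5)
)

# Share of students at each weekday consumption level.
weekday_share <- prop.table(table(df$Dalc))

# Students at the lowest level on both weekdays and weekends.
both_low <- sum(df$Dalc == 1 & df$Walc == 1)

# Among students at the lowest weekday level, the share who report
# a higher level on weekends.
low_weekday <- df$Dalc == 1
weekend_only_share <- mean(df$Walc[low_weekday] > 1)

# Students who report a higher level on weekdays than on weekends.
more_on_weekdays <- sum(df$Dalc > df$Walc)

# Pearson correlation between the two scales.
weekday_weekend_cor <- cor(df$Dalc, df$Walc)
```

On the real data, `both_low` would correspond to the 241 students and `weekend_only_share` to the 46.6% quoted above.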

## [1] "Percent of students in each group:"
## 
##           2           3           4           5           6           7 
## 0.371340524 0.178736518 0.152542373 0.112480740 0.077041602 0.049306626 
##           8           9          10 
## 0.026194145 0.009244992 0.023112481

For the remainder of this analysis, we divide the students into three groups based on their total alcohol consumption (the sum of the ‘Weekday’ and ‘Weekend’ values). ‘High Risk’ covers totals > 7, or either the ‘Weekday’ or ‘Weekend’ value equal to 5. Of the remaining students, ‘Low Risk’ covers totals <= 3 and ‘Medium Risk’ covers totals from 4 to 7.
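The grouping rule can be written as a small function. This is a sketch under the rule as stated in the text, not the original code; the function name `assign_risk` and its argument names are hypothetical.

```r
# Assign a risk group from the weekday (dalc) and weekend (walc) scores,
# each on a 1-5 scale. The 'High' rule is checked first, so a student with
# a 5 on either scale is never placed in 'Medium'.
assign_risk <- function(dalc, walc) {
  total <- dalc + walc
  risk <- ifelse(total > 7 | dalc == 5 | walc == 5, "High",
                 ifelse(total <= 3, "Low", "Medium"))
  factor(risk, levels = c("Low", "Medium", "High"))
}

# Five illustrative students: Low, Medium, High, High, Medium.
example_risk <- assign_risk(c(1, 2, 1, 4, 3), c(1, 2, 5, 4, 4))
print(example_risk)
```

Returning an ordered set of levels (`Low`, `Medium`, `High`) keeps later tables and summaries in a sensible order.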

## [1] "Percent of students in each risk group:"
## 
##        Low     Medium       High 
## 0.55007704 0.36979969 0.08012327

Although it is good that the majority of the students are in the low risk group, it will be very important to build a model that can accurately predict which students are in the high risk group, so that they can receive the guidance they need to stop this detrimental behaviour.

Bivariate Analysis

## [1] "Number of students attending each school"
## 
##      Gabriel Pereira Mousinho da Silveira 
##                  423                  226
## 
##      Gabriel Pereira Mousinho da Silveira 
##             0.651772             0.348228

Although there are more students attending Gabriel Pereira, the relative number of students in each risk group is about the same.

## [1] "Number of students of each gender:"
## 
## Female   Male 
##    383    266
## 
##    Female      Male 
## 0.5901387 0.4098613

Although 59% of the students are female, 81% of those in the high risk group are male.

## [1] "Number of students for each age:"
## 
##  15  16  17  18  19  20  21  22 
## 112 177 179 140  32   6   2   1
## 
##          15          16          17          18          19          20 
## 0.172573190 0.272727273 0.275808937 0.215716487 0.049306626 0.009244992 
##          21          22 
## 0.003081664 0.001540832
## [1] "Correlation between age and risk"
## [1] 0.1048394

Although there are few older students on which to base this conclusion, it seems that students drink more as they get older. The correlation with risk (0.10) is weak, though.

## [1] "Number of students of each type of address:"
## 
## Rural Urban 
##   197   452
## 
##     Rural     Urban 
## 0.3035439 0.6964561

Many more students live in urban areas, but this does not tell us anything about the risk of alcohol abuse.

## [1] "Number of students of each family size:"
## 
## Greater than 3    Less than 3 
##            457            192
## 
## Greater than 3    Less than 3 
##      0.7041602      0.2958398

Students from smaller families are more likely to be at risk of alcohol abuse.

## [1] "Number of students of each parental marriage status group:"
## 
##    Apart Together 
##       80      569
## 
##     Apart  Together 
## 0.1232666 0.8767334

I am a bit surprised to see that students in the higher risk groups are more likely to have parents living together.

## [1] "Number of students for each level of education (Mother):"
## 
##                None           4th Grade    5th to 9th Grade 
##                   6                 143                 186 
## Secondary Education    Higher Education 
##                 139                 175
## 
##                None           4th Grade    5th to 9th Grade 
##         0.009244992         0.220338983         0.286594761 
## Secondary Education    Higher Education 
##         0.214175655         0.269645609

There doesn’t appear to be any relationship between a mother’s education level and their child’s drinking habits.

## [1] "Number of students for each level of education (Father):"
## 
##                None           4th Grade    5th to 9th Grade 
##                   7                 174                 209 
## Secondary Education    Higher Education 
##                 131                 128
## 
##                None           4th Grade    5th to 9th Grade 
##          0.01078582          0.26810478          0.32203390 
## Secondary Education    Higher Education 
##          0.20184900          0.19722650

Much like with the mother’s education level, there doesn’t appear to be any relationship between a father’s education level and their child’s drinking habits.

## [1] "Correlation between mothers' and fathers' education levels:"
## [1] 0.6474766

There is a reasonably strong correlation between a mother’s and a father’s education level. This helps to explain the similarities in the plots we just saw.

## [1] "Number of students for each type of job (Mother):"
## 
##  At_Home   Health    Other Services  Teacher 
##      135       48      258      136       72
## 
##    At_Home     Health      Other   Services    Teacher 
## 0.20801233 0.07395994 0.39753467 0.20955316 0.11093991

Although ‘At_Home’ and ‘Teacher’ have a larger share of the high risk students, it’s a small increase. Remember, there are only 52 students in the high risk group, so a change of nearly 2% can be attributed to just one student.
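The "nearly 2%" figure follows directly from the group size. A short worked calculation, using only the numbers reported above (649 students, 8.012327% of them in the high risk group):

```r
n_students <- 649
high_share <- 0.08012327                    # share of students in the high risk group

n_high <- round(n_students * high_share)    # 52 students
one_student_pct <- 100 / n_high             # about 1.92 percentage points per student
```

So within the high risk group, moving a single student between job categories shifts that category's share by roughly 1.9 percentage points.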

## [1] "Number of students for each type of job (Father):"
## 
##  At_Home   Health    Other Services  Teacher 
##       42       23      367      181       36
## 
##    At_Home     Health      Other   Services    Teacher 
## 0.06471495 0.03543914 0.56548536 0.27889060 0.05546995

This looks more significant. If a father works in services, it seems that their child is more likely to abuse alcohol.

## [1] "Number of students for each type of reason:"
## 
## Course Preference     Close to Home             Other School Reputation 
##               285               149                72               143
## 
## Course Preference     Close to Home             Other School Reputation 
##         0.4391371         0.2295840         0.1109399         0.2203390

If the school was chosen based on its reputation, the student appears less likely to abuse alcohol.

## [1] "Number of students for each type of guardian:"
## 
## Father Mother  Other 
##    153    455     41
## 
##     Father     Mother      Other 
## 0.23574730 0.70107858 0.06317411

It is interesting to see that so many students chose their mother as their primary guardian, yet so few students have separated parents. Although the ratios are similar for each risk group, a student whose guardian is ‘Other’ is more likely to be in a higher risk group.

## [1] "Number of students for each travel time group:"
## 
## Less than 15     15 to 30     30 to 60 More than 60 
##          366          213           54           16
## 
## Less than 15     15 to 30     30 to 60 More than 60 
##   0.56394453   0.32819723   0.08320493   0.02465331
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.538   2.000   4.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.562   2.000   4.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.808   2.000   4.000
## [1] "Correlation between travel time and risk"
## [1] 0.07537328

We can see from the mean values that as travel time increases, students are more likely to abuse alcohol. However, the relationship is weak, as the identical medians and 3rd quartiles and the low correlation (0.075) show.
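Per-group summaries in the `df$Risk: Low` layout used throughout this section are what base R's `by()` prints. The sketch below shows that pattern on invented toy data. How the original analysis coded risk for the correlation is not shown, so coding the groups as 1, 2, 3 via `as.numeric()` is an assumption.

```r
# Toy data (illustrative only); Risk is a factor ordered Low < Medium < High.
df <- data.frame(
  traveltime = c(1, 1, 2, 1, 2, 3, 1, 4),
  Risk = factor(c("Low", "Low", "Low", "Medium", "Medium", "Medium", "High", "High"),
                levels = c("Low", "Medium", "High"))
)

# Five-number summary plus mean for each risk group.
by(df$traveltime, df$Risk, summary)

# Group means only, for a quicker comparison.
group_means <- tapply(df$traveltime, df$Risk, mean)

# Correlation with risk, treating the groups as 1, 2, 3 (assumed coding).
risk_cor <- cor(df$traveltime, as.numeric(df$Risk))
```

Because the features here are short ordinal scales, medians and quartiles often coincide across groups, which is why the means and the correlation carry most of the information.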

## [1] "Number of students for each study time group:"
## 
##  Less than 2 Hours       2 to 5 Hours      5 to 10 Hours 
##                212                305                 97 
## More than 10 Hours 
##                 35
## 
##  Less than 2 Hours       2 to 5 Hours      5 to 10 Hours 
##         0.32665639         0.46995378         0.14946071 
## More than 10 Hours 
##         0.05392912
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   2.048   2.000   4.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.825   2.000   4.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.615   2.000   4.000
## [1] "Correlation between time spent studying and risk:"
## [1] -0.1689503

There is a modest negative relationship here (a correlation of about -0.17): the less time a student spends studying, the more likely s/he is to abuse alcohol. Note: The values for ‘Time Spent Studying’ (1,2,3,4) numerically represent ‘Less than 2 Hours’, ‘2 to 5 Hours’, ‘5 to 10 Hours’, and ‘More than 10 Hours’.

## [1] "Number of students for each failure group:"
## 
##   0   1   2   3 
## 549  70  16  14
## 
##          0          1          2          3 
## 0.84591680 0.10785824 0.02465331 0.02157165
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.1709  0.0000  3.0000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00    0.25    0.00    3.00 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.4423  0.2500  3.0000
## [1] "Correlation between number of failed classes and risk:"
## [1] 0.1205551

As a student fails more classes, it seems that they are more likely to abuse alcohol. To simplify the relationship, let’s look at students having failed at least one class versus their risk group.

## [1] "Number of students for each failed group:"
## 
##  No Yes 
## 549 100
## 
##        No       Yes 
## 0.8459168 0.1540832
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.123   1.000   2.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.179   1.000   2.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    1.00    1.25    1.25    2.00

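This makes the relationship clearer. With ‘No’ coded as 1 and ‘Yes’ as 2, the means show that about 12% of low risk, 18% of medium risk, and 25% of high risk students have failed at least one class, so a student who has failed a class is somewhat more likely to abuse alcohol.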
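The simplified ‘failed’ flag summarised above could be derived from the `failures` count as in the sketch below. The variable name `failed` and the toy values are hypothetical. With the levels ordered `No`, `Yes`, the factor's numeric codes are 1 and 2, so a group mean minus 1 is the share of students who have failed a class.

```r
# Toy failure counts (illustrative only).
failures <- c(0, 0, 1, 3, 0, 2)

# Collapse the count into a two-level flag.
failed <- factor(ifelse(failures > 0, "Yes", "No"), levels = c("No", "Yes"))

# Counts and proportions for each level.
failed_counts <- table(failed)
failed_props <- prop.table(failed_counts)

# Mean of the 1/2 codes minus 1 equals the share of 'Yes'.
share_failed <- mean(as.numeric(failed)) - 1
```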

## [1] "Number of students for each educational support group:"
## 
##  No Yes 
## 581  68
## 
##        No       Yes 
## 0.8952234 0.1047766
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    1.00    1.12    1.00    2.00 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.092   1.000   2.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.058   1.000   2.000

If a student receives extra educational support, they are less likely to abuse alcohol.

## [1] "Number of students for each educational support group:"
## 
##  No Yes 
## 251 398
## 
##        No       Yes 
## 0.3867488 0.6132512
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.633   2.000   2.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.596   2.000   2.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   2.000   1.558   2.000   2.000

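The direction here is the same as for school support, but the effect is small. With ‘No’ coded as 1 and ‘Yes’ as 2, the share of students receiving educational support from their family falls only from about 63% in the low risk group (mean 1.633) to about 56% in the high risk group (mean 1.558).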

## [1] "Number of students for each paying group:"
## 
##  No Yes 
## 610  39
## 
##         No        Yes 
## 0.93990755 0.06009245

Paying for extra classes doesn’t change a student’s drinking habits.

## [1] "Number of students for each activity group:"
## 
##  No Yes 
## 334 315
## 
##        No       Yes 
## 0.5146379 0.4853621

Students in the high risk group are more likely to participate in extra-curricular activities.

## [1] "Number of students for each nursery group:"
## 
##  no yes 
## 128 521
## 
##        no       yes 
## 0.1972265 0.8027735

Attending nursery school as a young child looks to decrease the likelihood of drinking excessively when older.

## [1] "Number of students for each education group:"
## 
##  No Yes 
##  69 580
## 
##        No       Yes 
## 0.1063174 0.8936826

Students who are less inclined to attend higher education are more likely to drink excessive amounts of alcohol.

## [1] "Number of students for each internet group:"
## 
##  No Yes 
## 151 498
## 
##        No       Yes 
## 0.2326656 0.7673344

There doesn’t seem to be a relationship between alcohol consumption and internet access at home.

## [1] "Number of students for each relationship group:"
## 
##  No Yes 
## 410 239
## 
##        No       Yes 
## 0.6317411 0.3682589

There is no strong relationship between having a significant other and alcohol consumption.

## [1] "Number of students for each quality group:"
## 
##  Very Bad       Bad   Average      Good Excellent 
##        22        29       101       317       180
## 
##   Very Bad        Bad    Average       Good  Excellent 
## 0.03389831 0.04468413 0.15562404 0.48844376 0.27734977
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   4.000   4.025   5.000   5.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   4.000   3.825   4.000   5.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   4.000   3.769   5.000   5.000

Students in the high risk group typically have slightly worse family relationships.

## [1] "Number of students for each time group:"
## 
##  Very Low       Low   Average      High Very High 
##        45       107       251       178        68
## 
##   Very Low        Low    Average       High  Very High 
## 0.06933744 0.16486903 0.38674884 0.27426810 0.10477658
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   3.078   4.000   5.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     3.0     3.0     3.3     4.0     5.0 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   3.000   3.327   4.000   5.000
## [1] "Correlation between Amount of free time after school and risk:"
## [1] 0.1008569

There is a weak but positive relationship between the amount of free time after school and alcohol consumption. One of the main features of these plots is that students in the high risk group are much more likely to have a ‘very high’ amount of free time.

## [1] "Number of students for each social group:"
## 
##  Very Low       Low   Average      High Very High 
##        48       145       205       141       110
## 
##   Very Low        Low    Average       High  Very High 
## 0.07395994 0.22342065 0.31587057 0.21725732 0.16949153
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.866   4.000   5.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   3.000   3.429   4.000   5.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    4.00    5.00    4.25    5.00    5.00
## [1] "Correlation between frequency of going out with friends and risk"
## [1] 0.3472357

This could be the most differentiating feature that we have seen yet. We can clearly see that students who go out with their friends more often are more likely to be in a high risk group.

## [1] "Number of students for each health group:"
## 
##  Very Bad       Bad  Mediocre      Good Very Good 
##        90        78       124       108       249
## 
##  Very Bad       Bad  Mediocre      Good Very Good 
## 0.1386749 0.1201849 0.1910632 0.1664099 0.3836672
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   4.000   3.415   5.000   5.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   4.000   3.646   5.000   5.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   4.000   3.865   5.000   5.000
## [1] "Correlation between health and risk:"
## [1] 0.1008952

We could be seeing the bias of a personal survey here. Despite having a very unhealthy habit, those in the high risk group consider themselves to be the healthiest.

## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   3.092   4.000  22.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   2.000   4.146   6.000  32.000 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   4.000   5.308   8.000  21.000
## [1] "Correlation between number of absences and risk:"
## [1] 0.1496446

Students in higher risk groups seem to miss more classes than those in the low risk group.

## [1] "Period 1"
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   10.00   12.00   11.81   14.00   19.00 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00    9.00   11.00   11.01   13.00   18.00 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00    9.00   10.00   10.37   12.00   17.00
## [1] "Period 2"
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0      10      12      12      14      19 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.00   11.00   11.17   13.00   18.00 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.00   10.00   10.46   12.00   18.00
## [1] "Period 3"
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   11.00   13.00   12.45   14.00   19.00 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   10.00   11.00   11.41   13.00   19.00 
## -------------------------------------------------------- 
## df$Risk: High
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    9.75   11.00   10.44   12.00   17.00

Students in the high risk group have the worst grades on average.

## [1] "Period 2 grades minus period 1 grades"
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -9.0000 -1.0000  0.0000  0.1877  1.0000 11.0000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -8.0000 -1.0000  0.0000  0.1625  1.0000  4.0000 
## -------------------------------------------------------- 
## df$Risk: High
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -7.00000 -1.00000  0.00000  0.09615  1.00000  5.00000
## [1] "Correlation with risk:"
## [1] -0.01602408
## [1] "Period 3 grades minus period 2 grades"
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -7.000   0.000   0.000   0.451   1.000   3.000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -8.0000  0.0000  0.0000  0.2417  1.0000  3.0000 
## -------------------------------------------------------- 
## df$Risk: High
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -9.00000  0.00000  0.00000 -0.01923  1.00000  6.00000
## [1] "Correlation with risk:"
## [1] -0.1122831
## [1] "Period 3 grades minus period 1 grades"
## df$Risk: Low
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -9.0000  0.0000  1.0000  0.6387  2.0000 11.0000 
## -------------------------------------------------------- 
## df$Risk: Medium
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -8.0000  0.0000  1.0000  0.4042  1.0000  4.0000 
## -------------------------------------------------------- 
## df$Risk: High
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -11.00000  -0.25000   1.00000   0.07692   1.00000   6.00000
## [1] "Correlation with risk:"
## [1] -0.0918462
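The three grade-change summaries above are differences between the period grades `G1`, `G2`, and `G3`. A minimal sketch of how such features could be built follows; the new column names and the toy grades are hypothetical.

```r
# Toy grades (illustrative only), on the same 0-20 scale as the dataset.
df <- data.frame(
  G1 = c(10, 12, 8),
  G2 = c(11, 12, 6),
  G3 = c(12, 11, 0)
)

# Change between consecutive periods and over the whole year.
df$G2_minus_G1 <- df$G2 - df$G1
df$G3_minus_G2 <- df$G3 - df$G2
df$G3_minus_G1 <- df$G3 - df$G1

print(df)
```

The whole-year change is the sum of the two consecutive changes, so the three features are not independent of one another.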

Multivariate Analysis

Note: There are a number of combinations of features that I compared, but I will only present the plots that show a stronger relationship.

There look to be two clusters in this plot. One is in the top left, for students who stay home and study more (low risk), and the other is in the bottom right, for students who go out more and study less (higher risk).

The main grouping that I see here is in the top right, which represents males who go out more often with their friends (higher risk).

The main cluster of interest is in the bottom right, for male students with below median grades (higher risk).

Building and Training the Models

## [1] "Perform recursive feature engineering."
## +(rfe) fit Fold1 size: 54 
## -(rfe) fit Fold1 size: 54 
## +(rfe) imp Fold1 
## -(rfe) imp Fold1 
## +(rfe) fit Fold1 size: 15 
## -(rfe) fit Fold1 size: 15 
## +(rfe) fit Fold1 size: 14 
## -(rfe) fit Fold1 size: 14 
## +(rfe) fit Fold1 size: 13 
## -(rfe) fit Fold1 size: 13 
## +(rfe) fit Fold1 size: 12 
## -(rfe) fit Fold1 size: 12 
## +(rfe) fit Fold1 size: 11 
## -(rfe) fit Fold1 size: 11 
## +(rfe) fit Fold1 size: 10 
## -(rfe) fit Fold1 size: 10 
## +(rfe) fit Fold1 size:  9 
## -(rfe) fit Fold1 size:  9 
## +(rfe) fit Fold1 size:  8 
## -(rfe) fit Fold1 size:  8 
## +(rfe) fit Fold1 size:  7 
## -(rfe) fit Fold1 size:  7 
## +(rfe) fit Fold1 size:  6 
## -(rfe) fit Fold1 size:  6 
## +(rfe) fit Fold1 size:  5 
## -(rfe) fit Fold1 size:  5 
## +(rfe) fit Fold1 size:  4 
## -(rfe) fit Fold1 size:  4 
## +(rfe) fit Fold1 size:  3 
## -(rfe) fit Fold1 size:  3 
## +(rfe) fit Fold1 size:  2 
## -(rfe) fit Fold1 size:  2 
## +(rfe) fit Fold1 size:  1 
## -(rfe) fit Fold1 size:  1 
## +(rfe) fit Fold2 size: 54 
## -(rfe) fit Fold2 size: 54 
## +(rfe) imp Fold2 
## -(rfe) imp Fold2 
## +(rfe) fit Fold2 size: 15 
## -(rfe) fit Fold2 size: 15 
## +(rfe) fit Fold2 size: 14 
## -(rfe) fit Fold2 size: 14 
## +(rfe) fit Fold2 size: 13 
## -(rfe) fit Fold2 size: 13 
## +(rfe) fit Fold2 size: 12 
## -(rfe) fit Fold2 size: 12 
## +(rfe) fit Fold2 size: 11 
## -(rfe) fit Fold2 size: 11 
## +(rfe) fit Fold2 size: 10 
## -(rfe) fit Fold2 size: 10 
## +(rfe) fit Fold2 size:  9 
## -(rfe) fit Fold2 size:  9 
## +(rfe) fit Fold2 size:  8 
## -(rfe) fit Fold2 size:  8 
## +(rfe) fit Fold2 size:  7 
## -(rfe) fit Fold2 size:  7 
## +(rfe) fit Fold2 size:  6 
## -(rfe) fit Fold2 size:  6 
## +(rfe) fit Fold2 size:  5 
## -(rfe) fit Fold2 size:  5 
## +(rfe) fit Fold2 size:  4 
## -(rfe) fit Fold2 size:  4 
## +(rfe) fit Fold2 size:  3 
## -(rfe) fit Fold2 size:  3 
## +(rfe) fit Fold2 size:  2 
## -(rfe) fit Fold2 size:  2 
## +(rfe) fit Fold2 size:  1 
## -(rfe) fit Fold2 size:  1 
## +(rfe) fit Fold3 size: 54 
## -(rfe) fit Fold3 size: 54 
## +(rfe) imp Fold3 
## -(rfe) imp Fold3 
## +(rfe) fit Fold3 size: 15 
## -(rfe) fit Fold3 size: 15 
## +(rfe) fit Fold3 size: 14 
## -(rfe) fit Fold3 size: 14 
## +(rfe) fit Fold3 size: 13 
## -(rfe) fit Fold3 size: 13 
## +(rfe) fit Fold3 size: 12 
## -(rfe) fit Fold3 size: 12 
## +(rfe) fit Fold3 size: 11 
## -(rfe) fit Fold3 size: 11 
## +(rfe) fit Fold3 size: 10 
## -(rfe) fit Fold3 size: 10 
## +(rfe) fit Fold3 size:  9 
## -(rfe) fit Fold3 size:  9 
## +(rfe) fit Fold3 size:  8 
## -(rfe) fit Fold3 size:  8 
## +(rfe) fit Fold3 size:  7 
## -(rfe) fit Fold3 size:  7 
## +(rfe) fit Fold3 size:  6 
## -(rfe) fit Fold3 size:  6 
## +(rfe) fit Fold3 size:  5 
## -(rfe) fit Fold3 size:  5 
## +(rfe) fit Fold3 size:  4 
## -(rfe) fit Fold3 size:  4 
## +(rfe) fit Fold3 size:  3 
## -(rfe) fit Fold3 size:  3 
## +(rfe) fit Fold3 size:  2 
## -(rfe) fit Fold3 size:  2 
## +(rfe) fit Fold3 size:  1 
## -(rfe) fit Fold3 size:  1 
## +(rfe) fit Fold4 size: 54 
## -(rfe) fit Fold4 size: 54 
## +(rfe) imp Fold4 
## -(rfe) imp Fold4 
## +(rfe) fit Fold4 size: 15 
## -(rfe) fit Fold4 size: 15 
## +(rfe) fit Fold4 size: 14 
## -(rfe) fit Fold4 size: 14 
## +(rfe) fit Fold4 size: 13 
## -(rfe) fit Fold4 size: 13 
## +(rfe) fit Fold4 size: 12 
## -(rfe) fit Fold4 size: 12 
## +(rfe) fit Fold4 size: 11 
## -(rfe) fit Fold4 size: 11 
## +(rfe) fit Fold4 size: 10 
## -(rfe) fit Fold4 size: 10 
## +(rfe) fit Fold4 size:  9 
## -(rfe) fit Fold4 size:  9 
## +(rfe) fit Fold4 size:  8 
## -(rfe) fit Fold4 size:  8 
## +(rfe) fit Fold4 size:  7 
## -(rfe) fit Fold4 size:  7 
## +(rfe) fit Fold4 size:  6 
## -(rfe) fit Fold4 size:  6 
## +(rfe) fit Fold4 size:  5 
## -(rfe) fit Fold4 size:  5 
## +(rfe) fit Fold4 size:  4 
## -(rfe) fit Fold4 size:  4 
## +(rfe) fit Fold4 size:  3 
## -(rfe) fit Fold4 size:  3 
## +(rfe) fit Fold4 size:  2 
## -(rfe) fit Fold4 size:  2 
## +(rfe) fit Fold4 size:  1 
## -(rfe) fit Fold4 size:  1 
## +(rfe) fit Fold5 size: 54 
## -(rfe) fit Fold5 size: 54 
## +(rfe) imp Fold5 
## -(rfe) imp Fold5 
## +(rfe) fit Fold5 size: 15 
## -(rfe) fit Fold5 size: 15 
## +(rfe) fit Fold5 size: 14 
## -(rfe) fit Fold5 size: 14 
## +(rfe) fit Fold5 size: 13 
## -(rfe) fit Fold5 size: 13 
## +(rfe) fit Fold5 size: 12 
## -(rfe) fit Fold5 size: 12 
## +(rfe) fit Fold5 size: 11 
## -(rfe) fit Fold5 size: 11 
## +(rfe) fit Fold5 size: 10 
## -(rfe) fit Fold5 size: 10 
## +(rfe) fit Fold5 size:  9 
## -(rfe) fit Fold5 size:  9 
## +(rfe) fit Fold5 size:  8 
## -(rfe) fit Fold5 size:  8 
## +(rfe) fit Fold5 size:  7 
## -(rfe) fit Fold5 size:  7 
## +(rfe) fit Fold5 size:  6 
## -(rfe) fit Fold5 size:  6 
## +(rfe) fit Fold5 size:  5 
## -(rfe) fit Fold5 size:  5 
## +(rfe) fit Fold5 size:  4 
## -(rfe) fit Fold5 size:  4 
## +(rfe) fit Fold5 size:  3 
## -(rfe) fit Fold5 size:  3 
## +(rfe) fit Fold5 size:  2 
## -(rfe) fit Fold5 size:  2 
## +(rfe) fit Fold5 size:  1 
## -(rfe) fit Fold5 size:  1
## 
## Recursive feature selection
## 
## Outer resampling method: Cross-Validated (5 fold) 
## 
## Resampling performance over subset size:
## 
##  Variables Accuracy  Kappa AccuracySD KappaSD Selected
##          1   0.5945 0.1849    0.04246 0.08858         
##          2   0.6118 0.2220    0.04154 0.08659         
##          3   0.6196 0.2344    0.04995 0.10169         
##          4   0.6253 0.2601    0.04054 0.07842        *
##          5   0.6099 0.2327    0.04242 0.08306         
##          6   0.6060 0.2238    0.04303 0.08783         
##          7   0.6214 0.2572    0.03960 0.07463         
##          8   0.6060 0.2313    0.04439 0.08490         
##          9   0.6002 0.2235    0.05477 0.11715         
##         10   0.5906 0.2015    0.05196 0.10989         
##         11   0.5733 0.1727    0.04642 0.10073         
##         12   0.5809 0.1851    0.07610 0.15816         
##         13   0.5789 0.1881    0.04626 0.09886         
##         14   0.5713 0.1676    0.05376 0.10764         
##         15   0.5828 0.1992    0.06014 0.11869         
##         54   0.6194 0.2492    0.02099 0.04005         
## 
## The top 4 variables (out of 4):
##    maleOut, simple, combos, sex
##  [1] "bestSubset"   "call"         "control"      "dots"        
##  [5] "fit"          "maximize"     "metric"       "obsLevels"   
##  [9] "optsize"      "optVariables" "perfNames"    "pred"        
## [13] "resample"     "resampledCM"  "results"      "times"       
## [17] "variables"
## [1] "maleOut" "simple"  "combos"  "sex"
## [1] "Features ranked by importance:"
##      simple     maleOut      combos       goout         sex gooutSimple 
##   15.446790    9.572997    8.026173    7.787614    5.507151    5.409034 
##  gooutStudy 
##    3.161892

Train the random forest model.

## + Fold01: mtry=1 
## - Fold01: mtry=1 
## Aggregating results
## Fitting final model on full training set

Train the K-Nearest Neighbours model.

## + Fold1: k=21 
## - Fold1: k=21 
## Aggregating results
## Fitting final model on full training set

Train the support vector machines model.

## + Fold01: sigma=5, C=5, Weight=1 
## - Fold01: sigma=5, C=5, Weight=1 
## Aggregating results
## Fitting final model on full training set

Train the extreme gradient boosting model.

## + Fold01: nrounds=150, max_depth=3, eta=0.1, gamma=0.1, colsample_bytree=1, min_child_weight=0.8, subsample=1 
## - Fold01: nrounds=150, max_depth=3, eta=0.1, gamma=0.1, colsample_bytree=1, min_child_weight=0.8, subsample=1 
## Aggregating results
## Fitting final model on full training set
## Cross-Validated (15 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction  Low Medium High
##     Low    41.9   18.3  1.3
##     Medium  9.2   10.4  1.5
##     High    3.8    8.3  5.2
##                            
##  Accuracy (average) : 0.575

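These are not particularly good scores for the random forest. The ‘High’ group is classified reasonably well (about 65% of those students are identified: 5.2 of 8.0 percentage points), but the ‘Medium’ group is predicted poorly (about 28%: 10.4 of 37.0).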
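
The per-class rates behind that reading can be recomputed from the confusion matrix itself. The following is a minimal sketch in base R, assuming the cell values are typed in by hand from the output above; the names `rf_cv` and `recall` are illustrative and not part of the original analysis.

```r
# Cross-validated random forest confusion matrix, copied from the output above
# (rows = predicted class, columns = reference class, entries in percent).
classes <- c("Low", "Medium", "High")
rf_cv <- matrix(
  c(41.9, 18.3, 1.3,
     9.2, 10.4, 1.5,
     3.8,  8.3, 5.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

# Recall (sensitivity) per class: the share of each true class that was
# predicted correctly, i.e. the diagonal divided by the column totals.
recall <- diag(rf_cv) / colSums(rf_cv)
print(round(recall, 3))
```

This gives a recall of roughly 0.76 for ‘Low’, 0.28 for ‘Medium’ and 0.65 for ‘High’, which is why overall accuracy alone understates how differently the three groups are handled.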

## Cross-Validated (5 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction  Low Medium High
##     Low    49.2   23.7  1.9
##     Medium  4.8   11.5  3.5
##     High    1.0    1.7  2.7
##                             
##  Accuracy (average) : 0.6346

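Although the accuracy is higher (0.6346 versus 0.575), I am more concerned with correctly classifying the higher-risk students. By that measure the KNN model did worse than the random forest: it identified only about a third of the ‘High’ group (2.7 of 8.1 percentage points).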
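
If recall on the ‘High’ class matters more than accuracy, one option is to have the resampling report that metric directly. The sketch below is an assumption about how this could be done, not what the original analysis did: `highRiskSummary` is a hypothetical helper written in the shape caret accepts for `summaryFunction` (a data frame with `obs` and `pred` columns), and the toy data is made up purely to show the calculation.

```r
# Hypothetical helper (not from the original analysis): returns accuracy and
# the sensitivity of the "High" class for one set of resampled predictions.
# Note: HighSens is NaN if a resample contains no "High" observations.
highRiskSummary <- function(data, lev = NULL, model = NULL) {
  is_high <- data$obs == "High"
  c(
    Accuracy = mean(data$pred == data$obs),
    HighSens = sum(data$pred == "High" & is_high) / sum(is_high)
  )
}

# Small made-up example, only to illustrate the output.
lv <- c("Low", "Medium", "High")
toy <- data.frame(
  obs  = factor(c("Low", "Low", "Medium", "High", "High"),   levels = lv),
  pred = factor(c("Low", "Low", "Low",    "High", "Medium"), levels = lv)
)
scores <- highRiskSummary(toy, lev = lv)
print(scores)

# With caret installed, it could be plugged in roughly like this
# (x and y stand for the predictors and the outcome factor):
#   ctrl <- trainControl(method = "cv", number = 10,
#                        summaryFunction = highRiskSummary)
#   fit  <- train(x, y, method = "rf", trControl = ctrl, metric = "HighSens")
```

Selecting on a single class's sensitivity can push a model toward over-predicting that class, so it is worth reporting accuracy alongside it, as the helper does.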

## Cross-Validated (10 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction  Low Medium High
##     Low    46.3   23.3  1.7
##     Medium  7.5   11.7  3.8
##     High    1.2    1.9  2.5
##                             
##  Accuracy (average) : 0.6058
## Cross-Validated (10 fold) Confusion Matrix 
## 
## (entries are percentual average cell counts across resamples)
##  
##           Reference
## Prediction  Low Medium High
##     Low    48.7   23.8  1.9
##     Medium  5.2   10.2  3.8
##     High    1.2    2.9  2.3
##                             
##  Accuracy (average) : 0.6115

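The last two models (support vector machine and extreme gradient boosting) both make very poor predictions for the higher-risk groups: each identifies less than a third of the ‘High’ students and less than a third of the ‘Medium’ students.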

## [1] "Random Forest Model:"
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Low Medium High
##     Low     60     34    4
##     Medium   8      2    0
##     High     3     12    6
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5271          
##                  95% CI : (0.4374, 0.6156)
##     No Information Rate : 0.5504          
##     P-Value [Acc > NIR] : 0.7327          
##                                           
##                   Kappa : 0.125           
##  Mcnemar's Test P-Value : 3.237e-06       
## 
## Statistics by Class:
## 
##                      Class: Low Class: Medium Class: High
## Sensitivity              0.8451       0.04167     0.60000
## Specificity              0.3448       0.90123     0.87395
## Pos Pred Value           0.6122       0.20000     0.28571
## Neg Pred Value           0.6452       0.61345     0.96296
## Prevalence               0.5504       0.37209     0.07752
## Detection Rate           0.4651       0.01550     0.04651
## Detection Prevalence     0.7597       0.07752     0.16279
## Balanced Accuracy        0.5949       0.47145     0.73697
## [1] "K-Nearest Neighbour Model:"
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Low Medium High
##     Low     63     35    4
##     Medium   8     12    5
##     High     0      1    1
## 
## Overall Statistics
##                                          
##                Accuracy : 0.5891         
##                  95% CI : (0.4991, 0.675)
##     No Information Rate : 0.5504         
##     P-Value [Acc > NIR] : 0.2133         
##                                          
##                   Kappa : 0.1641         
##  Mcnemar's Test P-Value : 2.998e-05      
## 
## Statistics by Class:
## 
##                      Class: Low Class: Medium Class: High
## Sensitivity              0.8873       0.25000    0.100000
## Specificity              0.3276       0.83951    0.991597
## Pos Pred Value           0.6176       0.48000    0.500000
## Neg Pred Value           0.7037       0.65385    0.929134
## Prevalence               0.5504       0.37209    0.077519
## Detection Rate           0.4884       0.09302    0.007752
## Detection Prevalence     0.7907       0.19380    0.015504
## Balanced Accuracy        0.6075       0.54475    0.545798
## [1] "Support Vector Machines Model:"
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Low Medium High
##     Low     60     35    4
##     Medium  10     12    5
##     High     1      1    1
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5659          
##                  95% CI : (0.4758, 0.6529)
##     No Information Rate : 0.5504          
##     P-Value [Acc > NIR] : 0.3964489       
##                                           
##                   Kappa : 0.1282          
##  Mcnemar's Test P-Value : 0.0003715       
## 
## Statistics by Class:
## 
##                      Class: Low Class: Medium Class: High
## Sensitivity              0.8451       0.25000    0.100000
## Specificity              0.3276       0.81481    0.983193
## Pos Pred Value           0.6061       0.44444    0.333333
## Neg Pred Value           0.6333       0.64706    0.928571
## Prevalence               0.5504       0.37209    0.077519
## Detection Rate           0.4651       0.09302    0.007752
## Detection Prevalence     0.7674       0.20930    0.023256
## Balanced Accuracy        0.5863       0.53241    0.541597
## [1] "Extreme Gradient Boosting Model:"
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Low Medium High
##     Low     58     32    1
##     Medium  13     15    8
##     High     0      1    1
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5736          
##                  95% CI : (0.4836, 0.6603)
##     No Information Rate : 0.5504          
##     P-Value [Acc > NIR] : 0.330026        
##                                           
##                   Kappa : 0.1586          
##  Mcnemar's Test P-Value : 0.002334        
## 
## Statistics by Class:
## 
##                      Class: Low Class: Medium Class: High
## Sensitivity              0.8169        0.3125    0.100000
## Specificity              0.4310        0.7407    0.991597
## Pos Pred Value           0.6374        0.4167    0.500000
## Neg Pred Value           0.6579        0.6452    0.929134
## Prevalence               0.5504        0.3721    0.077519
## Detection Rate           0.4496        0.1163    0.007752
## Detection Prevalence     0.7054        0.2791    0.015504
## Balanced Accuracy        0.6240        0.5266    0.545798
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction Low Medium High
##     Low     53     31    1
##     Medium  16      5    3
##     High     2     12    6
## 
## Overall Statistics
##                                           
##                Accuracy : 0.4961          
##                  95% CI : (0.4069, 0.5855)
##     No Information Rate : 0.5504          
##     P-Value [Acc > NIR] : 0.90755         
##                                           
##                   Kappa : 0.0939          
##  Mcnemar's Test P-Value : 0.01462         
## 
## Statistics by Class:
## 
##                      Class: Low Class: Medium Class: High
## Sensitivity              0.7465       0.10417     0.60000
## Specificity              0.4483       0.76543     0.88235
## Pos Pred Value           0.6235       0.20833     0.30000
## Neg Pred Value           0.5909       0.59048     0.96330
## Prevalence               0.5504       0.37209     0.07752
## Detection Rate           0.4109       0.03876     0.04651
## Detection Prevalence     0.6589       0.18605     0.15504
## Balanced Accuracy        0.5974       0.43480     0.74118
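
To make the summary statistics in these test-set reports easier to interpret, here is a minimal base R sketch that recomputes accuracy, the no-information rate and Cohen's kappa from the random forest test matrix above. The counts are typed in by hand from that output, and the variable names are illustrative.

```r
# Random forest test-set confusion matrix, copied from the output above
# (rows = predicted class, columns = reference class, entries are counts).
classes <- c("Low", "Medium", "High")
rf_test <- matrix(
  c(60, 34, 4,
     8,  2, 0,
     3, 12, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

n        <- sum(rf_test)
accuracy <- sum(diag(rf_test)) / n

# No-information rate: accuracy of always predicting the most common class.
nir <- max(colSums(rf_test)) / n

# Cohen's kappa: agreement beyond what the row and column totals
# would produce by chance.
expected    <- sum(rowSums(rf_test) * colSums(rf_test)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(Accuracy = accuracy, NIR = nir, Kappa = cohen_kappa), 4))
```

The result matches the reported values (accuracy 0.5271, no-information rate 0.5504, kappa 0.125): the random forest is slightly less accurate than always predicting ‘Low’, even though it finds 6 of the 10 ‘High’ students.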

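There is still quite a bit of work to be done before this is a useful model. On the test set, none of the models is significantly more accurate than the no-information rate of 0.5504 (the smallest p-value is 0.2133, for KNN), and of the four labelled models only the random forest identifies more than one of the ten ‘High’ students (6 of 10). I am going to focus on feature engineering by exploring the relationships between features more closely. If you have any ideas about how this model can be improved, please post them on the discussion board. Thank you!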